-version 0.0
A simple version of convolution.
Input: 2 arrays of 4 elements, width = 4;
Output: convolution array, width = 8;

-version 1.0
Input: 2 arrays of 8 elements, width = 4;
Output: convolution array, width = 8;
Use a loop to calculate the convolution.
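
For reference, the loop-based calculation can be modelled in software.
The following is a minimal Python sketch, not the hardware source. The
function name convolve is hypothetical, the choice of full linear
convolution (len(a) + len(b) - 1 outputs) is an assumption, and bit widths
are not modelled.

```python
def convolve(a, b):
    """Full linear convolution of two integer sequences using nested loops.

    Assumption: the design computes the full convolution, i.e. every
    product a[i] * b[j] is accumulated into output position i + j.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    print(convolve([1, 2, 3, 4], [1, 1, 1, 1]))  # [1, 3, 6, 10, 9, 7, 4]
```

A model like this can serve as a golden reference when checking simulation
output by hand.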

-version 2.0
Input: din, assigned to 2 arrays of 8 elements, width = 8;
Output: dout, written from the convolution array, width = 16;
Use a loop to calculate the convolution.
Adds read and write.

-version 2.1
Remove internal registers from port lists.
The source file can now be executed without "-verilog".

-version 3.0
Unroll all the for loops and integrate the calculation with the writing.
Remove anew, bnew, res registers.
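
The idea of computing each output term at the moment it is written, with no
intermediate copies of the inputs and no result buffer, can be sketched in
Python as below. This is an illustration only. The name stream_convolution,
the parameter n, and the layout of din (first array followed by the second)
are assumptions, and the hardware version unrolls the loops rather than
iterating.

```python
def stream_convolution(din, n=8):
    """Yield convolution outputs one at a time, reading din directly.

    Assumed layout: din[0:n] holds the first array and din[n:2*n] the
    second. No copies of the inputs and no result array are kept; each
    output term is computed when it is emitted.
    """
    if len(din) != 2 * n:
        raise ValueError("din must hold exactly 2 * n values")
    for k in range(2 * n - 1):
        lo = max(0, k - n + 1)   # smallest valid index into the first array
        hi = min(k, n - 1)       # largest valid index into the first array
        # Output k is the sum of a[i] * b[k - i] over all valid i.
        yield sum(din[i] * din[n + k - i] for i in range(lo, hi + 1))


if __name__ == "__main__":
    print(list(stream_convolution([1, 2, 3, 4, 1, 1, 1, 1], n=4)))
    # [1, 3, 6, 10, 9, 7, 4]
```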

-version 3.1
Replace 32-bit integers with 4-bit registers.


